Abstract

A generalized Davenport-Schinzel sequence is one over a finite alphabet that contains no subsequences 
isomorphic to a fixed forbidden subsequence. One of the fundamental problems in this area is bounding 
(asymptotically) the maximum length of such sequences. Following Klazar, let Ex(σ, n) be the maximum
length of a sequence over an alphabet of size n avoiding subsequences isomorphic to σ. It has been proved
that for every σ, Ex(σ, n) is either linear or very close to linear; in particular it is O(n · 2^{α(n)^{O(1)}}), where
α is the inverse-Ackermann function and O(1) depends on σ. However, very little is known about the
properties of σ that induce superlinearity of Ex(σ, n).

In this paper we exhibit an infinite family of independent superlinear forbidden subsequences. To 
be specific, we show that there are 17 prototypical superlinear forbidden subsequences, some of which 
can be made arbitrarily long through a simple padding operation. Perhaps the most novel part of our 
constructions is a new succinct code for representing superlinear forbidden subsequences. 



1 Introduction 

Standard Davenport-Schinzel sequences (or DS sequences) are those avoiding a fixed length alternating
subsequence of the form abab···. The primary applications of these sequences are in bounding the complexity
of geometric objects, particularly the lower envelopes of line segments or arbitrary functions with a bounded
number of crossings; Agarwal and Sharir [2] have an excellent monograph on geometric applications of DS
sequences. It is not difficult to prove that Ex(abab, n) = Θ(n), though for longer forbidden subsequences
the problem of asymptotically bounding the length of the longest DS sequence is not easy. A celebrated
result of Hart and Sharir [12] showed that Ex(ababa, n) = Θ(nα(n)), where α is the slowly growing inverse of
Ackermann's function. It follows that Ex((ab)^k, n) and Ex((ab)^k a, n) are also superlinear in n for all k ≥ 3,
though how superlinear has still not been completely resolved. Agarwal, Sharir, and Shor [3] gave nearly
tight bounds on the length of standard DS sequences:



    Ex((ab)^k, n)   = n · 2^{Θ(α(n)^{k-2})}

    Ex((ab)^k a, n) ≥ n · 2^{Ω(α(n)^{k-2})}
    Ex((ab)^k a, n) ≤ n · 2^{O(α(n)^{k-2} log α(n))}



A natural generalization of DS sequences is to consider arbitrary forbidden subsequences, not necessarily
those of the form abab···.¹ At the moment we have only a limited understanding of how Ex(σ, n) could
behave, and how it does behave for specific σ. By generalizing the upper bounds of Agarwal et al. [3],
Klazar showed that Ex(σ, n) = n · 2^{α(n)^{O(1)}}, where the O(1) depends on σ. However, there are no
commensurate lower bounds, i.e., no specific σ_c with ababa ⊀ σ_c for which Ex(σ_c, n) > n · 2^{α(n)^c}. (The notation
x ≺ y and x ⊀ y mean, respectively, that x is and isn't isomorphic to a subsequence of y.) A more
basic question, and the subject of this paper, is to identify the features of a forbidden subsequence σ that
cause Ex(σ, n) to be superlinear. One can see that the set of all superlinear forbidden subsequences can be
characterized by a unique set of minimal such forbidden subsequences. We define Φ to be this set:

Definition 1.1 Φ is the smallest set of sequences such that:

    Ex(σ, n) = ω(n) if and only if ∃σ̄ ∈ Φ : σ̄ ≺ σ



¹ This idea was even suggested by Davenport and Schinzel; see [18] for a discussion of this.
It is amazing how little we know about Φ. Hart and Sharir's result [12] shows that ababa ∈ Φ and Adamec
et al. [1] showed that no other sequence in {a, b}* is in Φ. Klazar [17] showed that Φ contains at least two
elements: ababa and another which is currently unknown, but is a subsequence of abcbadadbcd. In other
words, the presence of ababa ≺ σ is not the sole cause of superlinearity in Ex(σ, n). Klazar's result [17] is
actually more general in that he shows that any σ for which G(σ) is strongly connected has Ex(σ, n) = ω(n),
where G(σ) is a digraph derived from the syntactic structure of σ; see Figure 1. In other words, [17] raised
the possibility that strong connectivity is the sole cause of superlinearity in generalized Davenport-Schinzel
sequences.




Figure 1: The digraph G(σ) has one vertex for each letter in the alphabet of σ. Assuming that σ is
repetition-free, an edge (u, v) appears in G(σ) if σ contains as a subsequence either vuvu or uvvu. Left:
G(ababa); Right: G(abcbadadbcd).



New Results. In this paper we introduce an infinite set Ψ of independent superlinear forbidden
subsequences. The elements of Ψ are not fundamentally different but, in fact, naturally divide themselves into
just 17 categories. We call the simplest elements of each category the prototypes; the elements of Ψ are
ababa, the prototypes, and an infinite number of sequences that can be derived from the prototypes through
a padding operation.

Why 17? There is no particularly good explanation for this number, except that it comes from a new 
compact notation we use for expressing forbidden subsequences. Whereas a forbidden subsequence is a 
string over an arbitrarily large alphabet, we can express elements of Ψ as relatively short strings over the
fixed alphabet {^, ^, O,^,^, )}* that follow some grammatical rules. Grammatical strings correspond to 
superlinear forbidden subsequences and there just happen to be 17 natural classes of grammatical strings.

Our result refutes the possibility that strong connectivity is the cause of superlinearity and addresses an
open question posed by Klazar [18], namely, is Φ infinite or finite? We are able to show that |Φ| ≥ 5, though
we cannot identify any particular members of Φ except ababa. In Section 6 we discuss why the infinitude of
Ψ supports the proposition that Φ is infinite.

Related Work. Davenport-Schinzel sequences are part of a class of problems concerning combinatorial
objects with forbidden substructures. Klazar [18] surveys generalizations of DS sequences to trees,
permutations, hypergraphs, 0-1 matrices, ordered digraphs, and partitions. Other examples of objects with forbidden
substructures are matrices with the Monge property [6 and (partially defined) monotone matrices [H [151 HI] • 
Whereas the subject of this paper is finding the causes of superlinearity in DS sequences, Adamec et al. [1]
and Klazar and Valtr [19] looked for specific causes of linearity. In [1] it is shown that Ex(abbaab, n) = O(n),
which implies that Ex(a^k b^k a^k b^k, n) = O(n) as well, for any k. A corollary of this result is that ababa is the
only two-symbol sequence in Φ. Klazar and Valtr [19] demonstrated that a few rules (resembling a context
free grammar) suffice to generate a huge variety of linear forbidden subsequences. Specifically, let Φ̄ be the
set of linear forbidden subsequences, i.e., those that do not contain any element of Φ as a subsequence.
Klazar and Valtr showed that:

    a ∈ Φ̄                                        a any symbol
    σ₁ a² σ₂ a ∈ Φ̄  ⟹  σ₁ a b^i a σ₂ a b^i ∈ Φ̄    b a symbol not appearing in σ₁ a² σ₂ a
    σ, σ₁ a² σ₂ ∈ Φ̄  ⟹  σ₁ a σ a σ₂ ∈ Φ̄           alphabet of σ has no overlap with σ₁ a² σ₂






2 Notation 



Let |σ| and ‖σ‖ be, respectively, the length of the sequence σ and the number of distinct symbols in σ. We
say that a sequence σ = (σ_j)_{1≤j≤|σ|} is a subsequence of Σ = (Σ_j)_{1≤j≤|Σ|} if there exist |σ| indices
j_1 < j_2 < ··· < j_{|σ|} such that σ_i = Σ_{j_i}. Two sequences are isomorphic if they are identical up to renaming
of symbols. We write σ ⊂ Σ and σ ≺ Σ to mean, respectively, that σ is a subsequence of Σ and that σ is
isomorphic to a subsequence of Σ. A sequence Σ (or class of sequences) is σ-free if σ ⊀ Σ. A sequence
σ = (σ_j)_j is c-regular if σ_i = σ_j (for i ≠ j) implies |j − i| ≥ c. For instance, a 2-regular sequence has no
immediate repetitions.

Definition 2.1 Ex(σ, n) = max {|Σ| : σ ⊀ Σ and ‖Σ‖ = n and Σ is ‖σ‖-regular}

The condition that Σ be ‖σ‖-regular simply rules out uninteresting sequences. For instance, the infinite
sequence ababababa··· is (abc)-free, but in the least interesting way.
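
The definitions of this section can be checked mechanically on tiny inputs. The following Python sketch is an illustration only and is not part of the original construction: the function names (is_subsequence, contains_iso, is_regular, ex_bruteforce) are hypothetical, and the exhaustive search takes exponential time, so it is feasible only for very small n.

```python
from itertools import permutations


def is_subsequence(pattern, text):
    """True if `pattern` occurs in `text` as a (not necessarily contiguous) subsequence."""
    remaining = iter(text)
    # `symbol in remaining` consumes the iterator up to the first match.
    return all(symbol in remaining for symbol in pattern)


def contains_iso(sigma, seq):
    """True if some renaming of the symbols of `sigma` is a subsequence of `seq`."""
    pattern_symbols = sorted(set(sigma))
    alphabet = sorted(set(seq))
    for image in permutations(alphabet, len(pattern_symbols)):
        rename = dict(zip(pattern_symbols, image))
        if is_subsequence([rename[s] for s in sigma], seq):
            return True
    return False


def is_regular(seq, c):
    """True if equal symbols always occur at positions at least c apart."""
    last_seen = {}
    for pos, symbol in enumerate(seq):
        if symbol in last_seen and pos - last_seen[symbol] < c:
            return False
        last_seen[symbol] = pos
    return True


def ex_bruteforce(sigma, n):
    """Exhaustively compute Ex(sigma, n) as in Definition 2.1 (tiny n only)."""
    c = len(set(sigma))  # the regularity parameter is the alphabet size of sigma
    best = 0

    def extend(seq):
        nonlocal best
        if len(set(seq)) == n:  # the sequence must use exactly n symbols
            best = max(best, len(seq))
        for symbol in range(n):
            candidate = seq + [symbol]
            # Both conditions are closed under taking prefixes, so pruning is safe.
            if is_regular(candidate, c) and not contains_iso(sigma, candidate):
                extend(candidate)

    extend([])
    return best
```

For example, ex_bruteforce("abab", 3) explores all 2-regular sequences over three symbols that avoid abab; the search terminates because, under the regularity condition, such sequences have bounded length.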

A symbol refers to a member of the alphabet of a sequence and is distinguished from an occurrence of a
symbol. For instance abbccbbc contains 3 symbols, with 1, 4, and 3 occurrences of a, b, and c, respectively.
In the text and figures we typically use the roman alphabet a, b, c, ... but in the proofs it is more convenient
to use the natural numbers 1, 2, 3, ....

If u and v are vertices in a rooted tree, u ⊲ v means that u is a strict descendant of v and u ⊴ v means u ⊲ v
or u = v. A generalized path compression [12] is an operation that, given a sequence (u_l, ..., u_1, u_0),
where u_l ⊲ u_{l−1} ⊲ ··· ⊲ u_1 ⊲ u_0, makes u_l, ..., u_1 the children of u_0 but otherwise does not affect the
structure of the given tree. (This definition differs from a standard path compression, where u_0 is the parent
of u_1, which is the parent of u_2, and so on.) We frequently call this operation a path compression or simply
a compression. We say the compression originates at u_l and terminates at u_0. The vertices u_l, ..., u_1
participate in the compression and the length of the compression is the number of participating vertices.
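
A generalized path compression is easy to state on a parent-pointer representation of a tree. The sketch below is a minimal illustration under assumptions of our own: the tree is stored as a dictionary mapping each vertex to its parent (the root maps to None), and the names compress and is_strict_descendant are hypothetical.

```python
def is_strict_descendant(parent, u, v):
    """True if u lies strictly below v in the tree given by `parent`."""
    node = parent.get(u)
    while node is not None:
        if node == v:
            return True
        node = parent.get(node)
    return False


def compress(parent, path):
    """Apply a generalized path compression in place.

    `path` lists the vertices (u_l, ..., u_1, u_0), each a strict descendant
    of the next.  Afterwards u_l, ..., u_1 are all children of u_0; no other
    parent pointer changes.
    """
    # Validate the whole chain before modifying anything.
    for lower, upper in zip(path, path[1:]):
        if not is_strict_descendant(parent, lower, upper):
            raise ValueError(f"{lower!r} is not a strict descendant of {upper!r}")
    *participants, terminal = path
    for u in participants:
        parent[u] = terminal
```

On the chain r, a, b, c, d (each vertex the parent of the next), compress(parent, ["d", "b", "r"]) makes d and b children of r while c stays a child of b. In particular d is no longer a descendant of b, so no later compression can contain both vertices.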

3 Path Compression Systems 

Our construction of path compression systems follows the same lines as Hart and Sharir [12, 2] and Tarjan
[25]. Given parameters i, j we construct a complete binary tree T(i, j) where nodes on each level of the tree
are assigned an integer label. Based on this labeling we construct a sequence of j · |T(i, j)| path compressions,
each with length i. The path compressions are then transcribed as a sequence S_{i,j} which avoids ababa as
well as an infinite number of other forbidden subsequences. For any trees T, T′, |T| is the number of leaves
in T and T ∘ T′ (the composition of T and T′) is derived by replacing each leaf of T′ with a copy of T.
Clearly |T ∘ T′| = |T| · |T′|; see Figure 2. Let t_j be a full binary tree with 2^j leaves. The tree T(i, j) is defined
recursively, as follows:

    T(1, j) = t_j
    T(i, 0) = t_c                              {c > 1 is an arbitrary constant}
    T(i, j) = T(i, j−1) ∘ T(i−1, |T(i, j−1)|)

Figure 2: Composition of trees in the construction of T(i, j).
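
Because |T ∘ T′| = |T| · |T′|, the recursion determines the number of leaves of T(i, j) without building any tree. The following sketch computes |T(i, j)| for small arguments. It rests on two assumptions of ours: the rule for T(1, j) is applied before the rule for T(i, 0) when both match, and the constant c is passed explicitly. The function name leaves is hypothetical.

```python
def leaves(i, j, c):
    """Number of leaves |T(i, j)| under the recursive definition.

    t_j has 2**j leaves, and composing two trees multiplies their leaf counts.
    Values grow extremely fast, so only tiny arguments are practical.
    """
    if i == 1:
        return 2 ** j          # T(1, j) = t_j
    if j == 0:
        return 2 ** c          # T(i, 0) = t_c
    smaller = leaves(i, j - 1, c)
    # T(i, j) = T(i, j-1) composed with T(i-1, |T(i, j-1)|)
    return smaller * leaves(i - 1, smaller, c)
```

With c = 2, for instance, |T(2, 1)| = 4 · 2^4 = 64 and |T(2, 2)| = 64 · 2^64. The next value, |T(2, 3)|, already has an exponent of about 2^70, which illustrates the Ackermann-like growth behind the inverse-Ackermann factor in the bounds.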



In other words, T(i, j) is the composition of t_c and j trees of the form T(i−1, ·), each of which is the
composition of t_c and several trees of the form T(i−2, ·) and so forth. The leaves of a tree T(i, ·) are called
i-nodes and are ordered from left to right. The internal nodes of any t_j are 0-nodes. If u is a leaf of T(i, j),
v_k(u) refers to the kth (i−1)-node ancestor of u in T(i, j), where 1 ≤ k ≤ j. If u is the ℓth leaf of v_k(u) in
T(i, j) then μ_k(u) = ℓ. See Figure 3 for an illustration. We define a sequence of path compressions as follows.




Figure 3: Illustration of v_k(x) and μ_k(x): v_k(x) is the kth (i−1)-node ancestor of x, and x is the
μ_k(x)th leaf of v_k(x).

Each leaf of T(i, j) is the origin of j path compressions and the path compressions are performed in postorder
by their point of origin. Let C(x, k) be the kth path compression originating from a leaf x in T(i, j). If i = 1
let C(x, k) = (x, v_k(x)). For i > 1 let C(x, k) = x · C(v_k(x), μ_k(x)). We transcribe these path compressions
into a sequence as follows. Each path compression and each vertex is assigned a distinct symbol and we
write p <_c q if compression p precedes compression q. Let ξ(x) be the path compressions that x participates
in, listed in decreasing order by <_c. Let S_{i,j} = ξ(u_1) · u_2 · ξ(u_2) · u_3 · ξ(u_3) ··· u_{2|T(i,j)|−1} · ξ(u_{2|T(i,j)|−1}), where
u_k is the kth vertex of T(i, j) in postorder. The symbol u_k appearing in S_{i,j} is distinct from the symbols
of all path compressions. Since its only function is to enforce a regularity condition we will generally ignore
these symbols. If i and j are understood or unimportant they will be omitted. For instance, T refers to T(i, j)
and the statements σ ≺ S and σ ⊀ S should be interpreted as σ ≺ S_{i,j} for some i, j and σ ⊀ S_{i,j} for all
i, j.


Lemma 3.1 just connects the length of S to the usual one- and two-argument versions of the inverse-Ackermann
function. Its full proof is standard and tedious.

Lemma 3.1 Let n = ‖S_{i,j}‖ and l = |T(i, j)|. Then |S_{i,j}| = Ω(nα(n, l)). When j = O(1), |S_{i,j}| = Ω(nα(n)).

Lemma 3.2 just lets us ignore regularity issues. For instance, we might say that σ ⊀ S without bothering
to explicitly mention that S is ‖σ‖-regular. Recall that c is the constant from the definition of T(i, 0).

Lemma 3.2 For i > 1 or j > c, S_{i,j} is c-regular.
Proof: Let u, v be two vertices included in some path compression a, where v is ancestral to u. One can see
from the recursive construction of T(i, j) and its associated path compressions that u and v are not 0-nodes
and that they are at distance at least c. Therefore, there must be at least c symbols between consecutive
occurrences of a in S. □



Lemma 3.3 is invoked repeatedly in order to simplify proofs and obtain contradictions. 



Lemma 3.3 For any two path compressions p <_c q, qpqp ⊄ S.

Proof: Let u, v, w, x be the vertices associated with the respective occurrences of q and p in the purported
subsequence qpqp appearing in S. (I.e., ξ(u) and ξ(w) contain q and ξ(v) and ξ(x) contain p.) It follows from
the inequality p <_c q that u ⊴ v ⊲ w ⊴ x; see Figure 4. One effect of the pth path compression is to make v
and x siblings and, as a direct consequence, to destroy the ancestor-descendant relationship between u and
w. Therefore no subsequent path compression can include both u and w. Contradiction. □

Figure 4: After compression p no compression can include both u and w. (In the figure, u and v are possibly
equal, as are w and x; after compression p, v and x are children of the terminal of p.)



Notice that Lemma 3.3 did not rely on any of the structure of S; the proof would go through if S were
the transcription of any system of path compressions. One corollary of Lemma 3.3 is that S is (ababa)-free,
which, by Lemma 3.1, implies that Ex(ababa, n) = Ω(nα(n)). This is one half of Hart and Sharir's proof [12]
that Ex(ababa, n) = Θ(nα(n)).

Lemma 3.4 Let u, v, w be vertices with u ⊲ v ⊴ w and suppose that p ∈ ξ(v) and q ∈ ξ(u), ξ(w). If the
compression p originates at a descendant of u then p ∈ ξ(u).

Proof: First observe that for any compression (u_i, ..., u_0) in our system, the intermediate vertices
are uniquely determined by u_i and u_0. In particular, u_{i′} is the first i′-node on the path from u_0 to u_i. (In
general any two vertices uniquely determine the whole compression.) Suppose that u is an i′-node and w an
i″-node, where i′ > i″; see Figure 5. It follows that u is the first i′-node on the path from w to the origin of
compression q. Since v ⊴ w and compression p originates below u it also follows that u is the first i′-node
on the path from v to the origin of p. Thus p ∈ ξ(u). □

Figure 5: The situation that causes p to make an "implied" appearance in ξ(u).

Lemma 3.4 serves a simple purpose in our construction of forbidden subsequences. It says that aside
from the explicit presence of q ∈ ξ(u), ξ(w) and p ∈ ξ(v), there must be an implied appearance of p ∈ ξ(u).
We deliberately design forbidden subsequences that are (pqpqp)-free but nonetheless cause implied symbols
to appear in inconvenient places, leading to implied occurrences of pqpqp.



4 Encoding Forbidden Subsequences 

Our most compact representation of forbidden subsequences is as strings over the alphabet {^^^^ilt^^^-k}^ 
where some groups of letters may be parenthesized. Each letter (or parenthesized set of letters) represents 
one symbol in the associated forbidden subsequence. The symbols corresponding to 9, <0, and ^ are called 
the binder, the guard, the trap, and the trapped. The roles of these symbols within the forbidden subsequence
will be much easier to explain after analyzing a couple of examples.

Theorem 4.1 Ex(abcaccbc, n) = Ω(nα(n)).

Proof: Suppose that σ = abcaccbc were to occur in S. By Lemma 3.3 we can eliminate all cases except
a <_c b <_c c. Let v_{a,i} be the vertex in T corresponding to the ith occurrence of a in σ. It follows from
the construction of S that v_{a,1}, v_{b,1}, v_{c,1} ⊴ v_{a,2} and that v_{c,2} ⊲ v_{c,3} ⊴ v_{b,2} ⊲ v_{c,4}. See Figure 6. We
apply Lemma 3.4 to the symbols b and c occurring in ξ(v_{c,2}), ξ(v_{b,2}), and ξ(v_{c,4}) and conclude that b must
also appear in ξ(v_{c,2}). In other words, if abcaccbc appears in S then abcacbcbc appears as well. Since, by
Lemma 3.3, S contains no subsequences isomorphic to ababa, it must also be σ-free. Therefore, Ex(σ, n) ≥
|S| = Ω(nα(n)). □

Figure 6: The vertex v_{x,i} ∈ T is such that ξ(v_{x,i}) contains the symbol corresponding to the ith occurrence
of x in σ. Dashed lines connect vertices that may be the same.



Let us analyze the functions of a, b, and c in the proof of Theorem 4.1. The symbol a did not appear in
the ultimate contradiction (the implied subsequence bcbcbc) but it did facilitate the contradiction by forcing
v_{b,1} and v_{c,1} to be descendants of v_{c,2}, v_{c,3}, v_{b,2}, and v_{c,4}. In our terminology a is the binder (symbolized by
^) because it binds previous symbols (i.e., vertices in T) under one common ancestor. The locations of b 
and c were chosen with the preconditions of Lemma 3.4 in mind. For the proof to go through we need c to
appear in ξ(v_{c,2}) and ξ(v_{c,4}) and b to appear in ξ(v_{b,2}), and, crucially, that v_{c,2} ⊲ v_{b,2}. This last condition
is enforced by the immediate repetition of c in σ. In our terminology c acts as a guard (making sure v_{b,2}
is a strict ancestor of v_{c,2}) and both b and c are trapped by c, meaning that the symbols b and c appear at
vertices that lie strictly above one occurrence of c and strictly below another occurrence of c. Guards, traps,
and trapped symbols are represented by O,^, and ^, respectively. Thus, we can represent a as ^d|b(0^^)* 
a acts as a binder, b as a trapped symbol, and c as a guard, a trap, and a trapped symbol. It is not true
that every string over {^, <0, ^, can be realized as a new superlinear forbidden subsequence. However, 
with only a few syntactic restrictions on the encoding we can show that each valid encoding corresponds to 
at most a constant number of forbidden subsequences, each of which is independent of the others. Before 
giving these restrictions we look at one more specific example. 
Theorem 4.2 Ex(abcbdadbcd, n) = Ω(nα(n)).

Proof: As before, suppose that σ = abcbdadbcd ≺ S. It follows from Lemma 3.3 that a <_c b <_c c <_c d
and from the construction of S that v_{a,1}, v_{b,2}, v_{d,1} ⊴ v_{a,2}, that v_{b,1}, v_{c,1} ⊴ v_{b,2}, and that v_{d,2} ⊲ v_{c,2} ⊲ v_{d,3}.
Lemma 3.4 (applied to c and d) implies that c appears in ξ(v_{d,2}); another application of Lemma 3.4 (now
with c and b) implies that c also appears in ξ(v_{b,2}); see Figure 7. In other words, if σ appears in S then
σ′ = abcbdadcbcd appears in S as well. This leads to a contradiction since bcbcbc ⊂ σ′, which, by Lemma 3.3,
can never appear in S. □

Figure 7: Dashed lines connect vertices that may be the same.
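
Both proofs end the same way: the forbidden subsequence itself contains no alternation of length five, but inserting the implied symbol creates an alternation of length six. The sketch below is our own illustration, not part of the paper, and the name longest_alternation is hypothetical. It computes the longest alternating subsequence xyxy··· over all ordered pairs of distinct symbols by a greedy scan, which is optimal for a fixed pair.

```python
from itertools import permutations


def longest_alternation(seq):
    """Length of the longest subsequence of the form x y x y ... with x != y."""
    best = 1 if seq else 0
    for x, y in permutations(set(seq), 2):
        expected, other, length = x, y, 0
        for symbol in seq:
            if symbol == expected:
                # Greedily take the earliest match, then wait for the other symbol.
                length += 1
                expected, other = other, expected
        best = max(best, length)
    return best
```

A sequence contains a subsequence isomorphic to ababa exactly when this value is at least 5. For abcaccbc and abcbdadbcd the value is 4, while for the sequences abcacbcbc and abcbdadcbcd obtained after inserting the implied symbol it is 6.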

In (J the appearance of c in ^('^0,2) is trapped by d and the implicit appearance of c in ^('^^,2) is trapped 
by b. The symbol a acts as a binder to insure that 'L'5^2 < '^d,2- The third occurrence of 6 in a acts as a 
guard to ensure that Vd^2 < '^c,2 and the second occurrence of d ensures that '^5^2 < Vd,2- We could encode 
a succinctly as 9(0A)^(0A) but it turns out that when there are two traps (by d and b in this case) they 
each act as guards for the other. There is no ambiguity in coding a as ^Jft^Jlt. 

Notice that σ is very similar to, but shorter than, the forbidden subsequence considered by Klazar [17]:
σ′ = abcbadadbcd. The relevant difference is that G(σ′) is strongly connected, which implies the superlinearity
of Ex(σ′, n), whereas G(σ) is not. Thus, the validity of Theorem 4.2 follows from different principles.

Definition 4.3 uses the following regular expression notation: X* represents zero or more repetitions of
X and [X, Y, Z] represents exactly one of X, Y, and Z.

Definition 4.3 A string in {9, 0, ♦,♦,^5 (, )}* is a legal compact encoding if it is 

(1) 9*** (8-10) m^*^*9[(0**),(0*)*,0**] 

(2-4) 9*[(0**), (❖♦)*, (11-12) in ^0* ^* ♦ ^* 9* or ^ ♦ ^* ♦ ^* 9* 

(5) (13-14) m ^0* ^* 9** or ^ ♦ ^* 9** 

(^^) (15-16) in ^* [«>♦), ^* ^* 9* 



The legal strings from Definition 4.3 could be generated from an alternative set of rules which are equally
unintuitive but may be helpful to keep in mind while reading the proofs.



Definition 4.4 A string is a legal compact encoding if it is of the form given in Definition 4-3(1,7) or if it 
contains exactly one 9; (}, and Jft, two ^s, possibly an unbounded number of i^s, at most one parenthesized 
expression which is either (0^) or (<)♦♦); cind is subject to the following restrictions: 

(i) The final symbol must be ilt (iv) A or the ^ must precede both ^s 

(a) The must precede at least one ^ ('^) ^ (0^) cannot precede the other ^ 

(Hi) All i^s must precede the 9 (vi) The first two symbols that are either or ^ 

must have a 4 between them 
[Figure 8: tree diagrams not reproduced. Forbidden subsequences legible in the figure: ababa, abcdeaebdce,
abcdeaecdbe, abcdebeadce, abcdaedeccbe, abcdaeceddbe, abcadeceddbe, abcbdaedebce, abcdeafefcdbf,
abcdbefdfaecf, abcdeafefbdcf, abcdeafdfbecf, abcdaefdfbecf, abcdeafdfcebf, abcdaefdfcebf, abcdeafcfdebf,
abcdaefcfdebf, abcadefcfdebf.]

Figure 8: The 17 prototypes. Each prototype has a compact encoding (using the minimum number of ^s) 
and each encoding can be realized by at most a constant number of labeled trees, each of which corresponds
to a forbidden subsequence. The vertex labels are indicated, except for the leaves. In each tree the leaves
are labeled in left-to-right order: a, b, c, ....



Most illegal strings still lead unambiguously to forbidden subsequences. We designate them illegal either
because we cannot prove that they are superlinear, or because they are trivially superlinear, e.g., if they
contain ababa or another subsequence known to be superlinear.

We describe below a two-step process for converting a legal compact encoding from Definition 4.3 into a
forbidden subsequence. The first step converts a compact encoding into a labeled tree representation. There
may be more than one possible tree representation per compact encoding, though never more than four.
The second step is to map a labeled tree into a forbidden subsequence; this mapping is always unique. See
Figure 8 for a depiction of the 17 prototypes. Each is represented as a legal encoding, a set of 1 or more
trees, and for each tree a corresponding forbidden subsequence. More complex forbidden subsequences can
be derived by inserting ^s into the compact encoding. 

Let A be a legal compact encoding. An element is a symbol in {^, 0, ^} or a parenthesized sequence 
of symbols from that set. Let |A| be the number of elements in A and A(j) be the jth element. We generate 
a tree tx from the bottom up as follows. We create |A| leaves where the jth leaf ^j is labeled j. (If A 
contains consecutive elements X{j)X{j + 1) = as in cases (1) or (7) from Definition 4.3, we create a 
new node labeled j that is the parent of £j and ^j+i- If this is the case, substitute £j for all references 
to ij and ij-^i below.) Suppose A(ji), . . . , A(j^_i) are the elements and X{ji) the element of A. We 
create / new internal nodes yi, . . . where yi is labeled ji. We make yi the parent of ij-^ and ij^, and in 
general, y^ is made the parent of yi-i and ^j^^i- Finally, yi is made the parent of £\x\' Figure [91 shows tx for 
A = ^*^2^^2^9*. 




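[Figure 9: tree diagram not reproduced. Leaves, left to right: a, b, c, d, e, f, g, h, i, j. Corresponding
sequence: abcadcefdgfhigjijehbj.]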



Figure 9: One example ta, for A = iP' (} iP dlb^Jlt. 

If a leaf Ij' has siblings to the left and right that are the children of a common parent yi then Ijf must 
also be a child of y^. Some leaves may be able to choose between one of two parents. For instance, if 
A = ^O^^^A then yi is the parent of and ^4, which forces ^2 and ^3 to be children of yi as well. We 
put y2 as the parent of yi and and have the freedom to make ^5 the child of either yi or ^2- See the 
last two trees on the third row of Figure [sj We create three nodes ancestral to yi named ^1,2:2,^3. Let 
be the index of the guard element of A, jf < be the indices of the trapped elements, and j* < j* the 
indices of the traps. (It is impossible for all t hese elements to be present simultaneously.) If A contains two 



traps (see cases (1) and (7) of Definition 4.3) we assign zi label ^2 label jf (there is no j^), and zs 

label j*. If A contains just one trap, namely j*, we assign Z2 label j^jf , zs label j*, and Zi label jfj^^ 
unless j* = j^, in which case zi is simply labeled j*. See Figure pi for the labeled tree representations of 
the prototypes. 

Once we settle on a particular labeled tree τ, turning it into a sequence is relatively straightforward. Let
σ(τ) be the concatenation of the labels of the nodes of τ in the (unique) postorder in which ℓ_1 precedes ℓ_2,
which precedes ℓ_3, and so on.
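
The second step, reading a sequence off a labeled tree, is a plain postorder traversal. The sketch below illustrates it on one small tree. The tree shape is an assumption of ours, chosen so that the traversal reproduces the sequence abcaccbc of Theorem 4.1: three leaves a, b, c under a node labeled a, topped by a chain of nodes labeled c, cb, and c. The names Node and transcribe are hypothetical.

```python
class Node:
    """A tree node carrying a (possibly multi-symbol) label."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def transcribe(node):
    """Concatenate the labels of the subtree rooted at `node` in left-to-right postorder."""
    return "".join(transcribe(child) for child in node.children) + node.label


# Assumed tree for the encoding behind Theorem 4.1 (illustrative only).
binder = Node("a", [Node("a"), Node("b"), Node("c")])
example_tree = Node("c", [Node("cb", [Node("c", [binder])])])
```

Here transcribe(example_tree) visits the leaves a, b, c, then the node labeled a, then the chain c, cb, c, giving abcaccbc.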

Definition 4.5 Ψ is the set containing ababa and all sequences that can be generated from a legal compact
encoding. 



5 General Superlinear Lower Bounds 

In this section we prove our main result, that all forbidden subsequences that could be generated from a
legal compact encoding are superlinear.

Theorem 5.1 For all σ ∈ Ψ, Ex(σ, n) = Ω(nα(n)).

Proof: Let A be the legal compact encoding that generated a. Theorems |4.1| and |4.2| cover the case of 



A G {^d|b(0^A)7 ^♦♦♦l, i-e., parts (1) and (2) of Definition 4.3 It is straightforward, given the proof 
of Theorem |4.2| and what follows, to cover A G ^♦dlb ^* ^Jft. For notational simplicity we will omit this 
case. Thus A contains exactly one <0, and 9, two ^s, an unbounded number of ^s, and at most one 
parenthesized element, which is either (0^) or (<)♦♦). (It may be helpful at this point to reread the 



properties of legal encodings from Definition 4.4 ) Let A(ji), . . . , A(j^_i) be the ^ elements, X{ji) the 9, and 
A(j^), A(jJ^), A(j^), and A(j*) the elements of type 0, ♦ and 4^, respectively. Observe that if A contains no 
parenthesized expressions, j* appears three times in a and all other symbols appear exactly twice. If there 
is a parenthesized expression, i.e., = or = = j*, then that symbol will appear three and fours 
times in a, respectively. 

Suppose that a ^ S. Let Vj^i be the vertex in T such that S,{vj^i) contains the ith occurrence of j 
in a. In order to show that this leads to a contradiction we first need to show that the induced sub- 
tree of T tree connecting {vj^i} mimics the structure of rx. Assume inductively that Vjj^^2 is ancestral to 
'^ji+i,!' • • • ^^jk+iA- Following the 2nd occurrence of jk in a we see the first occurrence of some sym- 
bols that must include j/c+2, followed by the second occurrence of jk-\-i- Since '^j^+i,! < (trivial) 
and '^jfc^i,! < ^jk,2 (ind. hyp.), it follows from the construction of S that Vjj^^2 O '^jk+i,2- This implies, by 
the inductive hypothesis, that . . . , '^^■^^2,1 descendants of Vj^^^^2' Furthermore, 'Uj^^i, . . . , 'U^* are 

descendants of Vj^^2' In other words, the symbol ji (corresponding to the 9 in A) functions as intended: by 
binding v-^i, , v-^i, , and 'U.* 1 under a common ancestor. The next step is to show that functions as a 
guard. 

The symbols following the second occurrence of j* in a a re j ^^jf^jti and j* in that order, where 
is omitted if = j* (as in cases (2) and (8) of Definition 4.3 ) First consider the case where j^, j*, 
and j2 are distinct. Regar dless of the compact encoding A, it alwa ys h olds that j^j^j^j^ and j'^J^j^ 
appear in a; see Definition 



4.4 



^i,ii). Thus, it follows from Lemma 3.3 that < j* and < j^. Since 
^(f 2) appears in S in decreasing order (by <), it must be that '^j*,2 ^ ^jf 2- This also implies that 
VjJi, 2 <l 2 — ^j*,3- The preconditions of Lemma 3.4 are satisfied (with respect to both of the pairs j*, jf 



and j*, j^), implying that jf and appear in ^('^j*,2)- Therefore, if a^S then jf j^jf j^jf as well, 
contradicting Lemma [331 

If = j* = j2 or = j2 (the cases where A contains (O^A) and (0^) respectively) the above 
proof goes though with somewhat simpler arguments, if = then v-^ ^ ^ ^jf 3 (this is trivial) and 



by Lemma 3.4, jf appears in ^('^j*,2)- Thus, if a^S then j^jf j^jf as well. The cases where 
jO = j* = are treated similarly. □ 
A natural question is whether there is any redundancy in Ψ. That is, if σ, σ̄ ∈ Ψ and σ̄ is a strictly shorter
subsequence of σ, then the superlinearity of Ex(σ̄, n) (Theorem 5.1) immediately implies the superlinearity
of Ex(σ, n). If the superlinearity of Ψ could be deduced from a strict subset (perhaps even a finite subset),
this would undermine the claim that the infinitude of Ψ is evidence for the infinitude of Φ. Theorem 5.2
shows that Ψ is minimal in the sense that every strict subsequence of an element in Ψ does appear as a
subsequence of S.



Theorem 5.2 For any σ ∈ Ψ, if σ̄ ⊂ σ and σ̄ ≠ σ then σ̄ ≺ S.



Proof: There are seventeen prototypes and, unfortunately, we have no elegant way to capture all of them 
with a single argument. Note, however, that prototypes (1-6) are fixed length and can be checked by hand. 
Prototype (7) is an oddball; however, it is simple to handle given what follows. The bulk of the proof covers 
cases (10-14, 16-17), which are the remaining ones that contain no parenthesized expressions. We sketch 
how to handle prototypes (8-9, 15) at the end. 

Let A be the compact encoding for a. Let ji, . . . , be the indices in A of the ^s and the 9; let j^, jf , j^, 
and j* be the indices of their respective types and assume for the moment that all these indices are distinct, 
i.e., assume there is no (0^) or (O^A) in A. Let a be a strict subsequence of a. If a is missing a symbol 
corresponding to a ^, 0, ^7 or d|b we can assume without loss of generality that the other occurrence of that 
symbol is missing as well. 

First consider the case where a is missing both occurrences of jVi, for some 1 < h < 1. We show that 
the ^ element no longer functions as a binder. As before, let vj^i G T be the vertex corresponding to the 
ith occurrence of j in the purported appearance of a ^ S. Unless explicitly contradicted, assume that (i) 
Vj^i is a leaf of T, (ii) all Vj^i are distinct, and (iii) Vj^i <] Vjf^if only if this relation must hold, given a. (For 
instance, if a = abab^ (iii) would require that Va,2 < ^b,2-) Let f be a tree on the vertices {vj^i} modeling 
the necessarily the ancestor /descendant relationships of (iii). In order to prove the theorem we only need 
to assign the vj^i to levels in T that is consistent with f and the construction of path compressions from 
Section [SI 
Let a = cTi (j2 j/i-i 0-3 j^+i <J4 j/i (J5 (76, where the refers to its second occurrence in a. If h = I 
then j^+i refers to j* and if = 1, does not exist. Thus, a = ai (72 jh-i o's jh+i crsj^ (Tq. A case-by- 



case check of Definition |4.3| (or the more readable Definition 4.4) shows that the alphabets of the sequences 
c'"! ^'"2 j/i-i, era, and j/i+i (74 (75 are mutually disjoint. Furthermore, the first and second occurrences of j* 
appear in jh+i c"4 cts and jf does not. Thus, in f, the least common ancestor of v-^i^ ^ and '^^♦,2 appears at 
or above Vj<> 2- Figure 10 illustrates the difference between r and f on a specific example. Suppose that 




Figure 10: Left: the model tree corresponding to abcdaedfegfhigjijbhcj. Right: the model tree corresponding 
to abcdaedeghigjijbhcj, where 'f' is missing. 

T = T{k^i)^ so all leaves (of T and f) are /c-nodes. Let VjX^2 be a (/c — l)-node. For all compressions j 
appearing in ai (72 jh-i c^s, let the {k — l)-node of compression j lie at the same level as Vj*,2 and lie between 
Vj^i and the parent of vj^i in f. See Figure 10 for a schematic of how nodes in f are assigned to levels in 



T. For each such j, Vj^2 is a (/c — 2)-node and '^^•♦,3 is a (/c — 2)-node. If and/or appear in (74 (75 then 
(by Lemma 3.4), j^, appear in ^('^^•♦,2) as well. Figure 10 gives an example of this situation: h = 



appears in ^('^^,2) = ^^('^j*,2)- this case '^^♦,2 is the {k — l)-node of compressions j^^jf and Vj0^2 and 
v.4> 2 are the {k — 2)-nodes in compressions and j^. For other compressions j appearing in (74 (75 (^s and 




^), the nodes v_{j,2} are (k − 1)-nodes at distinct levels in T. (In Figure 10, g and i are in this category.) It 
is clear, i.e., as clear as anything else in this proof, that the existence of these compressions could not cause 
any contradictions since they are effectively sequestered from the others; we ignore them in the arguments 
below. 

Recapping the above, all compressions in (7i (72 j/i-i (73, j^, j^, j* have their {k — l)-nodes at the same 
level and all these nodes are distinct, except possibly for and j^, whose (/c — l)-nodes may be equal to that 
of j*. For any {k — l)-node x there is exactly one compression containing this node and each {k — 2)-node 
ancestor of x that lies below the first {k — l)-node ancestor of x. Thus, so long as the {k — l)-nodes of 
compressions are distinct, any permutation of them in the sequence J^ctq is consistent with our construction 
of path compressions. We only need to worry about the compressions that may share a common (/c — l)-node. 
Since < < j* and these compressions appear in sorted order in J^ctq^ i.e., j* terminates above j^, 
which terminates above j^, these compressions are also consistent with our construction. 

Consider now the case when a is missing both occurrences of j^. Without the guard there is no contra- 
diction in letting Vj^ ^ — '^jf 2 ~ ^j*,2- ^^^^ every compression j we let v^^2 be the (k — l)-node of j and let 
'Uj* 3 be the (/c — 2)-node of j*. All the (/c — l)-nodes are distinct, with the exception of v-4i> ^ — 2 ~ ^j*,2- 
The same observation made above shows that all compressions with distinct (/c — l)-nodes are consistent with 
our construction, independent of the locations of their corresponding appearances in a. Since < < j* 
It is also consistent with our construction of S that ^('^^•♦,2) list jf , j^, and j* in decreasing order. 

The cases where a is missing or are similar to the case of above. If is missing there is no 
inconsistency in letting Vj*^2 (^ ~ l)-node in compression jf and Vj^ ^ ~ 2)-node in jf. As 

before, Vj*,3 is the {k — 2)-node of j* and all other nodes and their labels are the same as if were missing. 
Since jf < j* there is no inconsistency in letting ^jf, and j* appear in ^('^^♦,2) i^^ decreasing order. 
Suppose that one of the three occurrences of j* is missing. There are now exactly two occurrences of 
each symbol in a. For each j let v_{j,1} be the k-node of compression j and v_{j,2} be its (k − 1)-node. We put all 
the k-nodes at the same level in T whereas all of the (k − 1)-nodes are at different levels. By our construction 
of path compressions there exists exactly one compression that includes v_{j,1} and a particular (k − 1)-node 
ancestor. Thus, regardless of the arrangement of symbols in a, it is always possible to assign, in a consistent 
fashion, symbols to path compressions. (Note that the proof of this case goes through if all symbols appear 
in a in at most two runs, e.g., abbcccccbbaaccc has this property.) 
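
The two-run condition in the parenthetical note can be tested mechanically. The following Python sketch is illustrative only; the function names are ours and do not appear in the paper. It counts the maximal runs of each symbol and confirms that the example sequence satisfies the condition.

```python
from itertools import groupby


def runs_per_symbol(seq):
    """Count the maximal runs (blocks of equal adjacent symbols) of each symbol."""
    counts = {}
    for symbol, _ in groupby(seq):
        counts[symbol] = counts.get(symbol, 0) + 1
    return counts


def at_most_two_runs(seq):
    """True if every symbol of seq occurs in at most two runs."""
    return all(n <= 2 for n in runs_per_symbol(seq).values())


print(runs_per_symbol("abbcccccbbaaccc"))   # {'a': 2, 'b': 2, 'c': 2}
print(at_most_two_runs("abbcccccbbaaccc"))  # True
```

By contrast, ababa fails the test because a occurs in three runs.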

Extending the proof above to cases where A may contain elements (0^) or is not difficult. In the 

first case the symbol corresponding to (0^) appears three times. For instance, if A = (prototype 
(3)) the forbidden subsequence a = abcdadccbd contains three occurrences of c, corresponding to the (0^); 
see Figure [H The first and third occurrences serve the role of a trapped element and the first and second serve 
the role of a guard. If a is missing the second or third occurrence we use the same analysis as above, as if 
were missing. If a is missing the first occurrence then the remaining two are consecutive in a; clearly their 
presence could not assist in obtaining a contradiction. The other cases, where a is missing a ^, 9, or are 
handled in the same way as before. Turning to the case where A contains (<C>^J|t), the corresponding symbol 
appears four times. For instance, the forbidden subsequence corresponding to ^d|b(O^A) is a = abcaccbc^ 
where the first, second, and fourth occurrences serve the role as a trap and the first and third occurrences 
serve as both a guard and a trapped element. If a is missing the second or third occurrence we use the 
earlier analysis as if were missing. If a is missing the first or fourth occurrence we use the analysis as if 
j* were missing. 

The arguments above cover all prototypes except (1) and (7), which are easy exercises. 

□ 



Theorems 5.1 and 5.2 together show that there are an infinite number of superlinear forbidden subsequences, 
each of which is not a subsequence of any other. With the exception of ababa we cannot say that 
any particular member of ^ is in Φ. However, we can show that |Φ| ≥ 5 non-constructively. Recall that the 
previous bound of |Φ| ≥ 2 [17] followed from the superlinearity of σ' = ababa and σ'' = abcbadadbcd. Clearly 
σ' is a palindrome (a sequence isomorphic to its reversal) and σ'' is not. If σ'' were in Φ its reversal would 
be there as well. However, since σ'' contains a palindrome as a subsequence, namely abadadbd, we can only 
conclude |Φ| ≥ 2. 
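
The palindrome claims in this paragraph can be checked directly. The sketch below is a minimal illustration, with helper names of our own choosing. Two sequences are treated as isomorphic when they agree after renaming symbols in order of first appearance, and a palindrome is a sequence isomorphic to its reversal.

```python
def normalize(seq):
    """Rename symbols in order of first appearance, e.g. 'dbd' -> 'aba'."""
    mapping = {}
    out = []
    for ch in seq:
        if ch not in mapping:
            mapping[ch] = chr(ord("a") + len(mapping))
        out.append(mapping[ch])
    return "".join(out)


def is_palindrome(seq):
    """True if seq is isomorphic to its own reversal."""
    return normalize(seq) == normalize(seq[::-1])


def is_subsequence(small, big):
    """True if small can be obtained from big by deleting symbols."""
    remaining = iter(big)
    return all(ch in remaining for ch in small)


print(is_palindrome("ababa"))                     # True
print(is_palindrome("abcbadadbcd"))               # False
print(is_subsequence("abadadbd", "abcbadadbcd"))  # True
print(is_palindrome("abadadbd"))                  # True
```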

Theorem 5.3 |Φ| ≥ 5. 

Proof: Klazar and Valtr [19] showed that any forbidden subsequence σ over three letters has Ex(σ, n) = O(n) 
unless one of σ_1 = ababa, σ_2 = abcacbc, σ_3 = abcbcac, σ_4 = abacabc, σ_5 = abacacb is a subsequence of 
σ. Notice that σ_1 is the only palindrome and that σ_2 and σ_3 are isomorphic to the reversals of σ_4 and 
σ_5, respectively. Let π_2 = abcaccbc and π_3 = abcadcddbd be the prototypical forbidden subsequences 
corresponding to Definition 4.3(2,8); see Figure [sj, the second and fifth diagrams on the first row. One can 
check that σ_2 ≺ π_2 but σ_1, σ_3, σ_4, σ_5 ⊀ π_2, and that σ_3 ≺ π_3 but σ_1, σ_2, σ_4, σ_5 ⊀ π_3. Thus, either π_2 (and its 
reversal) are in Φ or σ_2 (and its reversal σ_4) are in Φ. Similarly, some σ' ≺ π_3 (and its reversal) must be in Φ. 
Notice that all palindrome subsequences of π_3 are linear, e.g., abab, bccb, bdddb, bcdcb. This implies that 
σ' is not isomorphic to its reversal and that it is distinct from π_2, σ_2, and σ_4. Thus, Φ contains at least five 
elements: ababa and four distinct subsequences of π_2 and π_3 and their reversals. □ 
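
The containment claims used in the proof ("one can check that ...") can be confirmed by exhaustive search. The following Python sketch is an illustration under our own naming, not the paper's method. It tries every injective renaming of the pattern's letters into the letters of the text, so it is only practical for small alphabets such as the ones here.

```python
from itertools import permutations


def is_subsequence(small, big):
    """True if small can be obtained from big by deleting symbols."""
    remaining = iter(big)
    return all(ch in remaining for ch in small)


def contains_isomorphic(pattern, text):
    """True if text has a subsequence isomorphic to pattern (brute force)."""
    p_letters = sorted(set(pattern))
    t_letters = sorted(set(text))
    for image in permutations(t_letters, len(p_letters)):
        rename = dict(zip(p_letters, image))
        if is_subsequence("".join(rename[ch] for ch in pattern), text):
            return True
    return False


sigma = {1: "ababa", 2: "abcacbc", 3: "abcbcac", 4: "abacabc", 5: "abacacb"}
pi_2, pi_3 = "abcaccbc", "abcadcddbd"

# Indices i for which sigma_i is contained in pi_2 and in pi_3, respectively.
print([i for i in sigma if contains_isomorphic(sigma[i], pi_2)])  # [2]
print([i for i in sigma if contains_isomorphic(sigma[i], pi_3)])  # [3]
```

The reversal relations can be confirmed the same way: reversing abacabc and renaming its letters in order of first appearance yields abcacbc, and reversing abacacb yields abcbcac.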



6 Discussion 

It is reasonable to think, for no reasons except those aesthetic, that Φ is infinite, i.e., that there are infinitely 
many causes of superlinearity in generalized DS sequences. In this paper we have exhibited the first 
candidate for Φ, namely ^, which seems to capture the forbidden structures of the "standard" sequence of 
n path compressions with length O(nα(n)). There are numerous reasons to think that, even if Φ is infinite, 
it does not contain ^ and may only share ababa in common with ^. However, as we argue below, 
some of the obvious objections to the plausibility of ^ ⊆ Φ are not as grounded as one might expect. 

One immediate objection is that all σ ∈ ^ have Ex(σ, n) = Ω(nα(n)). Even if ^ did fully characterize 
those forbidden subsequences with superlinear growth Ω(nα(n)), why assume that the superlinear spectrum 
between ω(n) and o(nα(n)) is empty? There's no good response to this objection, except that functions 
o(nα(n)) are exceedingly rare. We are not aware of any natural phenomenon that induces a function 
o(nα(n)), though it is possible to manufacture such functions. Loebl and Nešetřil [22] defined a specialized 
sequence of path compressions whose length is roughly nε(n), where ε is the inverse of the quickly growing 
function corresponding to the ordinal ε_0. 

Footnote: After |Φ| = 1 is excluded no other cardinality seems quite right. 

It is not farfetched to assume that Ex(σ, n) is either O(n) or Ω(nα(n)) for any forbidden subsequence σ. 
Even so, the superlinearity of forbidden subsequences in ^ was established by looking at a specific process, 
namely path compressions, a specific sequence of path compressions, and a specific way of transcribing path 
compressions into our ultimate sequence S. Is there any reason to believe that S holds a privileged position 
in the world of generalized DS sequences, despite its apparently ad hoc construction? We think that S is 
somewhat special. This opinion is grounded not in our aesthetic judgement but in historical precedent. 

Since Tarjan's discovery of the inverse-Ackermann function over 30 years ago we have seen it appear in a 
wide range of problems. The union-find data structure [25l [28l [30l [Til |20l [13] is certainly the most high profile 
example. Other examples include one dimensional range searching [33l[5l[9], range searching over trees [27 ( 129 1 
[SnilHl [111 |24] , lower envelopes [32l[2], searching monotone matrices [15l[T4], low diameter spanners [7], and, of 
course, Davenport-Schinzel sequences [T2l|2l[T8] and related problems concerning forbidden substructure [T8] . 
The fact that the inverse-Ackermann function shows up all over the place is not surprising. (One could rattle 
off a similar list for any other function.) The surprising part is that all the results above ultimately relate 
back to one combinatorial object, namely path compressions on balanced trees. Connecting these problems 
to path compressions is not always direct or simple. To take one example, Klawe's superlinear lower bound 
on searching monotone matrices [M] is identified with the complexity of the lower envelope of line segments 
[32], which is identified with (ababa)-free Davenport-Schinzel sequences, and generalized postordered path 
compressions ^ or, equivalently, standard path compressions on balanced trees. In the other examples 
cited above it is usually easier to convert the domain-specific combinatorial structure, say, a hard instance 
of one dimensional range searching [9], into a system of path compressions. Given this history it would be 
truly astounding if it were possible to prove a lower bound of Ex(σ, n) = Ω(nα(n)) using some combinatorial 
construction that was fundamentally unrelated to path compressions on balanced trees. 

Although we see path compressions as the canonical manifestation of the inverse-Ackermann function, 
we will not argue that S is the only sensible transcription of path compressions. If it is possible to prove 
that, say, Ex(abcacbc, n) = Ω(nα(n)), the proof would likely use (implicitly or explicitly) the same sequence 
of path compressions from Section 3 or [25, 12] but a completely different transcription method. 